In recent years a popular nonparametric model for coarsened data 
is an assumption on the coarsening mechanism called coarsening at 
random (CAR). It has been conjectured in several papers that this 
assumption cannot be tested by the data, that is, the assumption 
does not restrict the possible distributions of the data. In this paper 
we will show that this conjecture is not always true; an example 
will be current status data. We will also give conditions when the 
conjecture is true, and in doing so, we will introduce a generalized 
version of the CAR assumption. As an illustration, we retrieve the 
well-known result that the CAR assumption cannot be tested in the 
case of right-censored data. 


1. Introduction. When dealing with coarsened data, the coarsening may 
be due to some random effect. A condition on this random effect, called 
“coarsened at random,” or CAR, was proposed in Heitjan and Rubin (1991). 
In their setup the random variable of interest, which in this paper we will call 
$Y$, takes values in a finite set $\mathcal{Y}$. However, instead of observing $Y$ directly, 
we observe a nonempty random set $X \subset \mathcal{Y}$ such that with probability 1, 
$Y \in X$. They then define the CAR assumption as an assumption on the 
possible or allowed conditional distributions of $X$ given $Y = y$ [CAR is a 
modelling assumption, so a class of distributions for $(Y,X)$ is considered]: 

for all $A \subset \mathcal{Y}$, $P(X = A \mid Y = y)$ is constant in $y \in A$. 

They showed that in this setting, the CAR assumption ensured that the 
randomness of the coarsening could be ignored when making inference on the 
parameter of interest, namely, the distribution of Y. Many papers have since 
appeared generalizing this idea, especially to general sample spaces. We refer 
to Jacobsen and Keiding (1995) and Gill, van der Laan and Robins (1997) 
for a general introduction. Our goal is mainly to discuss the testability of 
the CAR assumption, that is, does the CAR assumption restrict the possible 
distributions of the data? 
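
The finite definition of Heitjan and Rubin can be made concrete with a small check. The following is a minimal sketch, not part of the original paper: the function name `is_car`, the dictionary layout and the two example mechanisms on a two-point space are all illustrative assumptions.

```python
from fractions import Fraction as F

def is_car(mechanism):
    """Check the finite CAR condition.

    mechanism[y][A] = P(X = A | Y = y), where A is a frozenset containing y.
    CAR holds if, for every set A, this probability is the same for all y in A.
    """
    sets = {A for probs in mechanism.values() for A in probs}
    for A in sets:
        values = {mechanism[y].get(A, F(0)) for y in A}
        if len(values) > 1:
            return False
    return True

# Hypothetical example: Y takes the values 0 and 1.
a, b, ab = frozenset({0}), frozenset({1}), frozenset({0, 1})
# The set {0, 1} is reported with probability 1/2 whatever Y is: CAR holds.
car = {0: {a: F(1, 2), ab: F(1, 2)}, 1: {b: F(1, 2), ab: F(1, 2)}}
# The set {0, 1} is reported more often when Y = 0: CAR fails.
not_car = {0: {a: F(1, 4), ab: F(3, 4)}, 1: {b: F(1, 2), ab: F(1, 2)}}
```

Exact fractions are used so that the comparison of conditional probabilities does not depend on floating-point rounding.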

We will start by giving a general model for coarsened data which is very 
close to the one given in Jacobsen and Keiding (1995), but without the 
measurability issues in that paper. We repeat that it is not our main goal to 
extend the notion of CAR to general sample spaces; therefore, we will not 
give an extensive comparison with definitions given in the aforementioned 
papers. We would just like to mention that in practical situations all 
definitions will lead to more or less the same concept. Furthermore, our notation 
will mostly be similar to that in Pollard (2002), with one notable exception: 
if $\mu$ is a measure on a space $\mathcal{Z}$ and $\pi$ is a measurable map from $\mathcal{Z}$ to $\mathcal{Y}$, 
then we denote the image measure on $\mathcal{Y}$ of $\mu$ under $\pi$ as $\pi(\mu)$. 

Let $\mathcal{Y}$ be the space of the variable of interest $Y$ (e.g., the time of onset 
of a certain disease). The stochastic variable $Y$ is distributed according to 
a probability measure $Q$. Let $\mathcal{Z}$ be a “hidden” space from which we can 
retrieve $Y$ and the data. To be more precise, the stochastic variable 
$Z \in \mathcal{Z}$ is distributed according to a probability measure $\mu$ and there exists a 
measurable map $\pi: \mathcal{Z} \to \mathcal{Y}$ such that $Y = \pi(Z)$. Furthermore, there exists a 
measurable map $\psi: \mathcal{Z} \to \mathcal{X}$, where $\mathcal{X}$ is the data space, such that $X = \psi(Z)$ 
is the observed data. In short, 

$$(\mathcal{Z},\mu) \xrightarrow{\ \pi\ } (\mathcal{Y},Q), \qquad (\mathcal{Z},\mu) \xrightarrow{\ \psi\ } (\mathcal{X},P).$$ 

The measure $\mu$, together with the mappings $\pi$ and $\psi$, contains all the 
information about how the variable of interest $Y$ is coarsened into the data 
$X$. This definition of coarsened data is more general than the one used by, 
for example, Gill, van der Laan and Robins (1997), where the data must 
consist of sets. However, it is also much easier to find counterexamples to 
the conjecture mentioned in the abstract, to which we will come shortly. 

First, to make things a bit more tangible, let us see how current status 
data fits into our framework; let Y be the time of onset of a certain disease, 
let C be the time of visiting a doctor, generally called the censoring time, 
and define the data X as 

$$X = (C, 1_{\{Y<C\}}).$$ 

Then $Z = (Y, C)$ (so $\mathcal{Z} = [0,\infty[ \times [0,\infty[$), $\pi(y, c) = y$ and $\psi(y, c) = (c, 1_{\{y<c\}})$. 

In Heitjan and Rubin (1991), Gill, van der Laan and Robins (1997), 
Nielsen (2000) and several others, coarsened data consists of sets $B$, 
elements of some $\sigma$-algebra $\mathcal{B}$ on $\mathcal{Y}$, such that $Y \in B$. Defining $\mathcal{Z} = \mathcal{Y} \times \mathcal{B}$, we 
see that this approach also fits into ours if we have proper conditions on $\mu$: 
we allow all $\mu$ such that $Y \in B$ almost surely. Of course, we could also say 
that our data consists of the set $\pi(\psi^{-1}\{x\}) \subset \mathcal{Y}$; it is, however, possible (see 
Example 2.1) that knowing $x$ provides more information. In any case, we 
find that our results are more clearly stated in our definition of coarsened 
data. 

Before we can state the CAR assumption, we need some more notation. 
We will restrict ourselves in this paper to dominated models, so we choose 
a fixed and known probability measure $\mu_0$ on $\mathcal{Z}$. In Gill, van der Laan and 
Robins (1997) the CAR assumption is also defined for the nondominated 
case [CAR(ABS)], but we will get back to this later. Define 

$$Q_0 = \pi(\mu_0) \quad\text{and}\quad P_0 = \psi(\mu_0).$$ 

Now we wish to condition on the map $\pi$ (or, equivalently, on $Y$). If $\mathcal{Z}$ and $\mathcal{Y}$ 
are, for example, Polish spaces, this can always be done via a Markov kernel; 
we define the conditional distribution of $Z$ under $\mu_0$ given $Y = y$, denoted 
by $\mu_0(dz \mid y)$, such that for each bounded measurable function $k$ on $\mathcal{Z}$ we 
have 

$$\int_{\mathcal{Z}} k(z)\,\mu_0(dz) = \int_{\mathcal{Y}} \Big( \int_{\mathcal{Z}} k(z)\,\mu_0(dz \mid y) \Big)\, \pi(\mu_0)(dy).$$ 

This is called a disintegration. Of course, we also have that 

$$\mu_0(\{z : \pi(z) \ne y\} \mid y) = 0$$ 

for $\pi(\mu_0)$-almost all $y$. 

Definition 1.1 (The CAR assumption). In the notation given above, 
the CAR assumption states that $\mu \ll \mu_0$ is a possible (or admitted) 
distribution of $Z$ if and only if 

$$\mu(dz) = g \circ \psi(z) \cdot h \circ \pi(z)\, \mu_0(dz),$$ 

where $h$ is an arbitrary density with respect to $Q_0$ and $g$ is a positive 
measurable function on $\mathcal{X}$ such that 

(1.1) $\int g \circ \psi(z)\, \mu_0(dz \mid y) = 1$ for $Q_0$-almost all $y$, 

which is equivalent with 

$$E_{\mu_0}(g(X) \mid Y) = 1.$$ 

This implies that $h(y)$ is the (marginal) density of $Y$ with respect to $Q_0$ and 
that the conditional distribution of $Z$, given $Y = y$, is given by 

$$\mu(dz \mid y) = g \circ \psi(z)\, \mu_0(dz \mid y).$$ 
This loosely means that we assume that given $Y$, the unknown part by 
which the coarsening mechanism chooses $Z$ (note that $\mu_0$ is known!) may 
only be a function of the data. Note that under CAR, we can choose an 
arbitrary density $h \in L^1(Q_0)$, but the measurable function $g$ must be positive 
and satisfy (1.1) [in particular, $g \in L^1(P_0)$]. This restriction on $g$ does not 
depend on $h$, however, which gives the set of all possible distributions of $Z$ 
under CAR a product structure. 

It might not be entirely clear why one would want to make such an 
assumption, but the popularity of the CAR assumption can largely be 
explained by the following proposition. First, we define a linear map 

(1.2) $S: L^1(Q_0) \to L^1(P_0)$ : $S(h)(x) = E_{\mu_0}(h(Y) \mid X = x)$ 

(remember that if $Z \sim \mu_0$, then $X \sim P_0$ and $Y \sim Q_0$). 

Proposition 1.2. Let $\mu$ be a distribution of $Z$ that satisfies the CAR 
assumption. This means that there exists $g \in L^1(P_0)_+$ such that $\mu(dz \mid y) = 
g \circ \psi(z)\, \mu_0(dz \mid y)$. Let $h$ be the marginal density of $Y$ with respect to $Q_0$ [so 
$\pi(\mu)(dy) = h(y) Q_0(dy)$]. Then the marginal distribution of $X$ is given by 

$$\psi(\mu)(dx) = g(x)\, S(h)(x)\, P_0(dx).$$ 

This shows that the likelihood of the data factorizes into a relevant 
factor [remember that $h(Y)$ as a function of $h$ is the likelihood based on the 
underlying data $Y$, the variable of interest, and note that $S$ is known] and 
a nuisance factor $g$. Since we can choose any $g$ that satisfies (1.1) and then 
choose an arbitrary density $h$ independent of the chosen $g$, the overall 
parameter space is a product space. So, for example, we know which $h$ would 
maximize the likelihood of the data, without having to know anything about 
the coarsening mechanism (except that it is CAR, of course). This factorization 
has further favorable consequences for likelihood-based (and, in particular, 
Bayesian) inference. 
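
The factorization in Proposition 1.2 can be verified by hand on a very small space. The sketch below is an illustration under assumed inputs, not part of the original paper: the four-point hidden space, the uniform reference measure `mu0`, and the particular `g` and `h` are hypothetical choices, and the helper names `marginal` and `S` are introduced here only for the example.

```python
from fractions import Fraction as F

# Hidden space: pairs (y, x) where x labels a set containing y.
points = [(0, "{0}"), (0, "{0,1}"), (1, "{1}"), (1, "{0,1}")]
mu0 = {z: F(1, 4) for z in points}  # assumed reference measure (uniform)

def marginal(mu, coord):
    """Image measure of mu under the projection onto coordinate `coord`."""
    out = {}
    for z, mass in mu.items():
        out[z[coord]] = out.get(z[coord], F(0)) + mass
    return out

Q0 = marginal(mu0, 0)  # distribution of Y under mu0
P0 = marginal(mu0, 1)  # distribution of X under mu0

def S(h):
    """S(h)(x) = E_{mu0}(h(Y) | X = x)."""
    num = {}
    for (y, x), mass in mu0.items():
        num[x] = num.get(x, F(0)) + h[y] * mass
    return {x: num[x] / P0[x] for x in P0}

h = {0: F(1, 2), 1: F(3, 2)}                           # density w.r.t. Q0
g = {"{0}": F(3, 2), "{1}": F(3, 2), "{0,1}": F(1, 2)}  # satisfies (1.1)

# Condition (1.1): E_{mu0}(g(X) | Y = y) = 1 for every y.
for y in Q0:
    cond = sum(g[x] * m for (yy, x), m in mu0.items() if yy == y) / Q0[y]
    assert cond == 1

# Distribution of Z under CAR and its X-marginal, computed directly ...
mu = {(y, x): g[x] * h[y] * m for (y, x), m in mu0.items()}
lhs = marginal(mu, 1)
# ... and via the factorization g(x) S(h)(x) P0(dx).
Sh = S(h)
rhs = {x: g[x] * Sh[x] * P0[x] for x in P0}
```

Both `lhs` and `rhs` describe the same distribution of $X$, which is the content of the proposition in this toy case.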

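Proof of Proposition 1.2. Let $k$ be a positive measurable function 
on $\mathcal{X}$. Remember that 

$$\mu(dz) = g \circ \psi(z) \cdot h \circ \pi(z)\, \mu_0(dz).$$ 

Then we have 

$$E_{\mu}(k(X)) = E_{\mu_0}(k(X) g(X) h(Y))$$ 

$$= E_{\mu_0}(k(X) g(X) E_{\mu_0}(h(Y) \mid X))$$ 

$$= E_{P_0}(k(X) g(X) S(h)(X)). \qquad \square$$ 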

The CAR assumption as we defined it depends on the choice of $\mu_0$, but 
we do have the following proposition: 

Proposition 1.3. Let $\mu_0$ and $\nu_0$ be probability measures on $\mathcal{Z}$ such 
that $\nu_0$ satisfies the CAR assumption for $\mu_0$ (in particular, $\nu_0 \ll \mu_0$). Then 
a probability measure $\mu \ll \nu_0$ on $\mathcal{Z}$ satisfies the CAR assumption for $\mu_0$ if 
and only if it satisfies the CAR assumption for $\nu_0$. 


Proof. Since $\nu_0$ satisfies the CAR assumption for $\mu_0$, we can write 

$$\nu_0(dz) = g_0 \circ \psi(z)\, h_0 \circ \pi(z)\, \mu_0(dz)$$ 

such that $h_0$ is a density for $Q_0$ and $E_{\mu_0}(g_0(X) \mid Y) = 1$, which means that 
$\nu_0(dz \mid y) = g_0 \circ \psi(z)\, \mu_0(dz \mid y)$. Suppose $\mu$ satisfies CAR for $\mu_0$, so we can write 

$$\mu(dz) = g_1 \circ \psi(z)\, h_1 \circ \pi(z)\, \mu_0(dz)$$ 

with $E_{\mu_0}(h_1(Y)) = 1$ and $E_{\mu_0}(g_1(X) \mid Y) = 1$. Note that 

$$h_1 Q_0 = \pi(\mu) \ll \pi(\nu_0) = h_0 Q_0,$$ 

so $h_1/h_0$ is well defined ($0/0 = 0$). The same reasoning, but with $\pi$ replaced 
with $\psi$, gives that $g_1/g_0$ is well defined. Now note that 

$$E_{\nu_0}\Big(\frac{h_1}{h_0}(Y)\Big) = E_{\mu_0}(h_1(Y)) = 1$$ 

and 

$$\int_{\mathcal{Z}} \frac{g_1}{g_0}(\psi(z))\, \nu_0(dz \mid y) = \int_{\mathcal{Z}} \frac{g_1}{g_0}(\psi(z))\, g_0(\psi(z))\, \mu_0(dz \mid y) = \int_{\mathcal{Z}} g_1(\psi(z))\, \mu_0(dz \mid y) = 1,$$ 

so 

$$\mu(dz) = (g_1/g_0) \circ \psi(z)\, (h_1/h_0) \circ \pi(z)\, \nu_0(dz)$$ 

satisfies CAR for $\nu_0$. 

If $\mu$ satisfies CAR for $\nu_0$, we conclude in a completely analogous way that 
$\mu$ satisfies CAR for $\mu_0$. □ 


This proposition shows that for any $\mu_0$ you pick such that a certain 
coarsening mechanism $\nu_0$ satisfies CAR for $\mu_0$ (and is, therefore, an element of 
your model), the possible distributions of $Z$ absolutely continuous with 
respect to $\nu_0$ are the same as when you would have chosen $\mu_0 = \nu_0$. Therefore, 
a logical choice for $\mu_0$ is a generic distribution for $Z$ that you would want 
to have in your model, preferably with as large a support as possible. 
One can easily verify that our definition of the CAR assumption is equivalent 
to the ones given in Gill, van der Laan and Robins (1997) (for the 
dominated case), Jacobsen and Keiding (1995) and Nielsen (2000), when 
we restrict ourselves to their respective setups (see also the discussion after 
Theorem 3.8). We would like to point out that for the factorization property 
of Proposition 1.2, Gill, van der Laan and Robins (1997) also have to restrict 
themselves to the dominated case. The conjecture made in Gill, van der Laan 
and Robins (1997) is that the CAR assumption does not restrict the possible 
distributions of the data, making it impossible to test whether the CAR 
assumption is fulfilled or not. In fact, they prove this conjecture (in their 
setup) when $\mathcal{Y}$ is a finite space. In the next section we will give examples 
where the conjecture actually fails, not only in our generalized setup, but 
also in the more restrictive setups. In Section 3 we will give sufficient and 
almost necessary conditions when the conjecture will hold. 


2. Examples. 


Example 2.1. Let $\mathcal{Y} = [0,\infty[$, $\mathcal{Z} = [0,\infty[ \times [0,\infty[$ and $Z = (Y, C)$. 
Define $X = \psi(Y, C) = CY$. This coarsening mechanism cannot be described as 
in Gill, van der Laan and Robins (1997), for knowing $X$ is not equivalent to 
knowing that $Y$ lies in the set of points compatible with the observation $X$. 
Now we have to choose $\mu_0$: 

$$\mu_0(dy\,dc) = e^{-c} e^{-y}\, dy\, dc.$$ 

The CAR assumption states that for a possible distribution $\mu$ of $Z$, there 
exist $h \in L^1(Q_0)$ and $g \in L^1(P_0)$ such that 

$$\mu(dy\,dc) = (g(cy) e^{-c}\, dc)\, h(y) e^{-y}\, dy.$$ 

Furthermore, (1.1) tells us that 

$$\int_0^\infty g(cy) e^{-c}\, dc = 1 \qquad \forall y > 0.$$ 

But this means that the Laplace transform of $g$ is identically equal to the 
Laplace transform of 1, and, therefore, $g = 1$. So the possible choices for $\mu$ 
are 

$$\mu(dy\,dc) = h(y) e^{-y} e^{-c}\, dy\, dc,$$ 

where $h$ is a density with respect to $Q_0(dy) = e^{-y}\, dy$. Note that $C$ is 
independent of $Y$ with a given distribution, and the distribution of $Y$ is arbitrary. 
A simple transformation of variables gives 

$$\psi(\mu)(dx) = \Big( \int_0^\infty h(y) e^{-y} e^{-x/y}\, \frac{1}{y}\, dy \Big)\, dx.$$ 
Therefore, X always has a decreasing density with respect to the Lebesgue 
measure on [0,oo[, which shows that in this case the CAR assumption does 
restrict the possible distributions of the data. 
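
The monotonicity claim can be checked directly from the formula for the density of $X$ in this example. The short derivation below is a sketch; the symbol $f_X$ is notation introduced here for that density and does not appear in the original text.

```latex
% f_X is notation introduced for this sketch: the Lebesgue density of X = CY.
f_X(x) = \int_0^\infty h(y)\, e^{-y}\, e^{-x/y}\, \frac{dy}{y}, \qquad x \ge 0 .
% For 0 \le x_1 < x_2 we have e^{-x_1/y} > e^{-x_2/y} for every y > 0,
% and h \ge 0, so the difference of the two integrals is nonnegative:
f_X(x_1) - f_X(x_2)
  = \int_0^\infty h(y)\, e^{-y} \bigl( e^{-x_1/y} - e^{-x_2/y} \bigr) \frac{dy}{y} \;\ge\; 0 .
```

No differentiation under the integral sign is needed for this argument, since it only compares the integrands pointwise.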

As noted before, the CAR assumption depends on the choice of $\mu_0$. To 
illustrate this, let us choose 

$$\mu_0(dy\,dc) = (y e^{-cy}\, dc) \cdot e^{-y}\, dy.$$ 

Then CAR implies for our (positive) function $g$ that 

$$\int_0^\infty g(cy)\, y e^{-cy}\, dc = 1 \qquad \forall y > 0.$$ 

However, this is nothing more than saying that $g$ is a density for $P_0$, since in 
this case $P_0$ is the standard exponential! Clearly, this means that the CAR 
assumption is not testable in this case. However, it is not hard to see that 
in this case $S(h) = 1$, so all information about $Y$ is lost. As a final remark, 
note that the CAR assumption is only affected by $\mu_0$ through $\mu_0(dc \mid y)$, the 
conditional distribution of $C$ given $Y = y$, so that choosing a different (but 
equivalent) $Q_0$ essentially leaves the CAR assumption unaltered (this also 
follows from Proposition 1.3). 


Example 2.2 (Current status). A much more important example, and 
one that also fits the setups of Jacobsen and Keiding (1995) and Gill, van 
der Laan and Robins (1997), is that of current status data. We will consider 
the bounded case, that is, all times considered fall in $[0,1]$, but it is not hard 
to see that this is not a real restriction. So define $\mathcal{Y} = [0,1]$, $Y$ is the time of 
interest, $C \in [0,1]$ the censoring time, and $Z = (Y,C)$, so $\mathcal{Z} = [0,1] \times [0,1]$. 
Define 

$$\psi(Y, C) = (C, 1_{\{Y<C\}}),$$ 

so $\mathcal{X} = [0,1] \times \{0,1\}$. The interpretation is that one knows the time one 
visited the doctor, and the doctor can say whether someone is sick or not. 
Choose $\mu_0(dy\,dc) = dy\,dc$. Then (1.1) implies that we can choose positive 
$g \in L^1(P_0)$ such that 

$$\int_0^1 g(c, 1_{\{y<c\}})\, dc = 1 \qquad \forall\, 0 < y < 1.$$ 

However, this says that 

$$\int_0^y g(c,0)\, dc + \int_y^1 g(c,1)\, dc = 1 \qquad \forall\, 0 < y < 1.$$ 

Differentiating with respect to $y$ shows that 

$$g(c,0) = g(c,1) \qquad \forall\, 0 < c < 1.$$ 
So CAR implies that the only allowed models for $\mu$ are 

$$\mu(dy\,dc) = g(c) h(y)\, dc\, dy,$$ 

where $h$ and $g$ are densities on $[0,1]$. This is, of course, equivalent with saying 
that $Y$ and $C$ have to be independent. 

Consider the following subsets of $\mathcal{X}$: 

$$A_1 = \{(x, 1) : x \in [0, \tfrac12]\} \quad\text{and}\quad A_2 = \{(x, 0) : x \in\, ]\tfrac12, 1]\}.$$ 

Let $\mathcal{P}$ be the set of all probability distributions on $\mathcal{X}$ and define for every 
$P \in \mathcal{P}$, 

$$\Phi(P) = (P(A_1), P(A_2)).$$ 

Clearly, 

$$\Phi(\mathcal{P}) = \{(a_1, a_2) \in [0,1]^2 : a_1 + a_2 \le 1\}.$$ 

Now suppose the CAR assumption holds, so $Y$ and $C$ are independent. Then 
we know that 

$$P(X \in A_1) = P(C \le \tfrac12 \text{ and } Y < C)$$ 

$$\le P(C \le \tfrac12 \text{ and } Y < \tfrac12)$$ 

$$= P(C \le \tfrac12) \cdot P(Y < \tfrac12).$$ 

Similarly, 

$$P(X \in A_2) \le P(C > \tfrac12) \cdot P(Y > \tfrac12).$$ 

This means that 

$$P(X \in A_1) \cdot P(X \in A_2) \le \tfrac{1}{16}.$$ 

So, if we define $\mathcal{P}_{CAR}$ as the set of all possible distributions of the data under 
the CAR assumption, then 

$$\Phi(\mathcal{P}_{CAR}) \subset \{(a_1, a_2) \in [0,1]^2 : a_1 + a_2 \le 1 \text{ and } a_1 \cdot a_2 \le \tfrac{1}{16}\}.$$ 

Since this is a proper subset of $\Phi(\mathcal{P})$, we conclude that in the case of current 
status, it is possible to find a distribution of the data that contradicts the 
CAR assumption. In a future paper we will discuss what would be a good 
way to test the CAR assumption in this important example. Here we would 
like to note a few things. In the first place, it is possible that the data 
distribution is an element of $\mathcal{P}_{CAR}$, even though the CAR assumption is not 
fulfilled: one easily checks that this happens when 

$$c \mapsto \int_0^c f(y \mid c)\, dy$$ 

is a continuous distribution function (i.e., nondecreasing), where $f(y \mid c)$ is 
the conditional density of $Y$ given $C = c$. This shows that it is impossible 
to verify CAR by the data; it is just sometimes possible to reject the CAR 
assumption. 
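
The bound on the product of the probabilities of $A_1$ and $A_2$ in this example can be illustrated by simulation. The following is a minimal sketch and not the test announced by the author: the helper names `phi`, `observe` and `dependent_draw`, the sample size, the seed and the particular dependent mechanism are all assumptions made for the illustration.

```python
import random

def observe(y, c):
    """Current status observation (C, 1{Y < C})."""
    return c, int(y < c)

def phi(sample):
    """Empirical version of Phi(P) = (P(A1), P(A2)) from observations (c, delta)."""
    n = len(sample)
    a1 = sum(1 for c, d in sample if c <= 0.5 and d == 1) / n
    a2 = sum(1 for c, d in sample if c > 0.5 and d == 0) / n
    return a1, a2

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000

# Y and C independent and uniform on [0, 1]: this mechanism is CAR.
independent = [observe(rng.random(), rng.random()) for _ in range(n)]

def dependent_draw():
    """Y falls below C when C <= 1/2 and above C otherwise, so CAR fails."""
    c, u = rng.random(), rng.random()
    y = c * u if c <= 0.5 else c + (1 - c) * u
    return observe(y, c)

dependent = [dependent_draw() for _ in range(n)]

a1, a2 = phi(independent)
b1, b2 = phi(dependent)
```

For the independent sample the product `a1 * a2` stays well below 1/16, while for the dependent sample `b1 * b2` is close to 1/4, so the latter data distribution lies outside the region that CAR allows.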

In the second place we note that the convex hull of all independent 
densities of $(Y, C)$ is weakly dense in the set of all densities, and, therefore, the 
convex hull of $\mathcal{P}_{CAR}$ is weakly dense in $\mathcal{P}$. This means that you cannot test 
the CAR assumption with one linear test function. In particular, it shows 
that the model for the distribution of the data under CAR is not convex. 

As a third remark, we would like to point out to the reader that although 
this example fits in the setup of Gill, van der Laan and Robins (1997) for 
CAR on general sample spaces, it does not fit in their setup for finite spaces, 
not even when we restrict Y and C to finitely many possible values. This is 
because the observed sets are all of the form {Y < C} or {Y > C}, and it is 
essential in their setup that the CAR assumption allow distributions such 
that all possible nonempty subsets of $\mathcal{Y}$ might be observable. See also the 
discussion after Theorem 3.8. 

Finally, it is not hard to show that under the assumption CAR(ABS) 
defined in Gill, van der Laan and Robins (1997), one can find all possible 
distributions of X by assuming that Y and C are independent, but can have 
any distribution (not necessarily dominated). This means that the argument 
given here also shows that CAR(ABS) restricts the possible distributions of 
the data X. We do not think that by restricting ourselves to the dominated 
case we throw away an important part of the possible distributions of X 
under CAR(ABS). 

3. General conditions for the testability of CAR. In this section we will 
give our most abstract definition of coarsened data, but we will first look at 
the map $S: L^1(Q_0) \to L^1(P_0)$. We repeat its definition: 

(3.1) $S(h)(x) = E_{\mu_0}(h(Y) \mid X = x)$. 

If we denote the duality between $L^1$-functions and $L^\infty$-functions by 
$\langle\cdot,\cdot\rangle$, we would like to remind the reader that the dual map 

$$S^*: L^\infty(P_0) \to L^\infty(Q_0)$$ 

is defined such that 

$$\langle S(h), k\rangle = \langle h, S^*(k)\rangle.$$ 

Note that for $k \in L^\infty(P_0)$, 

$$\langle S(h), k\rangle = \langle h \circ \pi,\, k \circ \psi\rangle_{\mu_0}.$$ 

Proposition 3.1. The linear map $S: L^1(Q_0) \to L^1(P_0)$ defined above 
has the following properties: 

1. $S(1) = 1$ and $S^*(1) = 1$, where $S^*$ denotes the dual of $S$. 

2. $S$ is positive, that is, $h \ge 0 \Rightarrow S(h) \ge 0$. 

3. $\|S\| = 1$, where $\|\cdot\|$ denotes the operator norm. 

Proof. Properties 1 and 2 are obvious. It is also clear that 

$$\|h \circ \pi\|_1 = \|h\|_1 \quad\text{and}\quad \|k \circ \psi\|_\infty = \|k\|_\infty$$ 

[here we use $Q_0 = \pi(\mu_0)$ and $P_0 = \psi(\mu_0)$], which shows that $\|S\| \le 1$. Since 

$S(1) = 1$, $\|S\| = 1$. □ 

The importance of the map $S$ is seen most clearly when we translate (1.1): 

$$E_{\mu_0}(g(X) \mid Y) = 1.$$ 

It is well known that 

$$S^*(g)(y) = E_{\mu_0}(g(X) \mid Y = y),$$ 

so this means that the CAR assumption restricts our choice for $g$ (remember 
that $g \circ \psi$ is the conditional density of $Z$ given $Y = y$, for all $y$) to all positive 
$g$ such that 

$$S^*(g) = 1.$$ 

This will lead us to a new definition of CAR. 

Definition 3.2. Let $Y$ be a stochastic variable of interest, defined on a 
space $\mathcal{Y}$, and let $Q_0$ be a probability measure on $\mathcal{Y}$. Let $\mathcal{X}$ be the data space 
and $P_0$ a probability measure on $\mathcal{X}$. We define a coarsening (of $Y$) as a 
linear map 

$$S: L^1(Q_0) \to L^1(P_0)$$ 

such that: 

1. $S(1) = 1$ and $S^*(1) = 1$, where $S^*$ denotes the dual of $S$. 

2. $S$ is positive, that is, $h \ge 0 \Rightarrow S(h) \ge 0$. 

We thank one of the referees for pointing out the following: every 
coarsening $S$ can be obtained through a conditional expectation, as we did in 
the original definition of CAR. To see this, define $\mathcal{Z} = \mathcal{Y} \times \mathcal{X}$. We define a 
probability measure $\mu_0$ on $\mathcal{Y} \times \mathcal{X}$ in the following way: let $A \subset \mathcal{Y}$ and $B \subset \mathcal{X}$ 
be measurable such that $1_A \in L^1(Q_0)$ and $1_B \in L^1(P_0)$. Then we define 

$$\mu_0(A \times B) = E_{P_0}(1_B(X)\, S(1_A)(X)).$$ 

This extends to a probability measure on $\mathcal{Y} \times \mathcal{X}$ such that for $h \in L^1(Q_0)$ 
and $k \in L^\infty(P_0)$, 

$$E_{\mu_0}(k(X) h(Y)) = E_{P_0}(k(X)\, S(h)(X)).$$ 
It is easy to check that $Q_0$ and $P_0$ are the marginals of $Y$, respectively, $X$, 
and that 

$$S(h)(x) = E_{\mu_0}(h(Y) \mid X = x).$$ 

From this it is clear that 

$$S^*(k)(y) = E_{\mu_0}(k(X) \mid Y = y),$$ 

so the map $S^*$ is in itself a coarsening of $X$. This is the content of the next 
lemma, which we will prove without using the auxiliary measure $\mu_0$. In fact, 
we believe the map $S$ to be the most convenient object to study, which is 
why we will not refer to $\mu_0$ again. 

Lemma 3.3. Let $S: L^1(Q_0) \to L^1(P_0)$ be a coarsening. Then: 

1. $S$ is continuous and $\|S\| = 1$. 

2. The dual map $S^*$ is also defined and continuous from $L^1(P_0)$ to $L^1(Q_0)$ 
(in fact, $S^*$ is a coarsening itself). 

Proof. Let $h \in L^1(Q_0)$. Then $-|h| \le h \le |h|$, so $|S(h)| \le S(|h|)$. Now, 

$$\|S(|h|)\| = \langle S(|h|), 1\rangle = \langle |h|, S^*(1)\rangle = \langle |h|, 1\rangle = \|h\|.$$ 

This, together with $S(1) = 1$, proves the first statement. 

Let $g \in L^1(P_0)_+$. There exists $\{g_n\} \subset L^\infty(P_0)_+$ such that $g_n \uparrow g$. Clearly, 
$S^*$ is also positive, so $S^*(g_n) \uparrow h$ for some $h \in L^1(Q_0)$ [note that 
$\langle h, 1\rangle = \lim\uparrow \langle S^*(g_n), 1\rangle = \lim\uparrow \langle g_n, S(1)\rangle = \langle g, 1\rangle$]. Also, if $h' \in L^\infty(Q_0)$, then 
$\langle h, h'\rangle = \langle g, S(h')\rangle$, so $h$ does not depend on the sequence $\{g_n\}$. Define $S^*(g) = h$. It 
is trivial to check that with this definition, $S^*$ is in itself a coarsening. □ 

Define for a probability measure $\nu$, 

$$\Delta_\nu = \{h \in L^1(\nu) : h \ge 0 \text{ and } \langle h, 1\rangle = 1\},$$ 

the set of densities with respect to $\nu$. 

Definition 3.4 (CAR). Let 

$$S: L^1(Q_0) \to L^1(P_0)$$ 

be a coarsening of a random variable $Y$. The CAR assumption now states 
that the distribution of the data belongs to the set 

$$\mathcal{P}_{CAR} = \{g \cdot S(h) : h \in \Delta_{Q_0},\ g \in L^1(P_0)_+ \text{ and } S^*(g) = 1\}.$$ 
First we should note that $\mathcal{P}_{CAR} \subset \Delta_{P_0}$, because 

$$\langle g \cdot S(h), 1\rangle = \langle h, S^*(g)\rangle = 1$$ 

and $S$ is a positive map. In this new definition we also retain the product 
structure of the likelihood of the data. The remark after Definition 3.2 shows 
that the only difference with the previous definition is that we restrict the 
distributions of the data $X$, instead of restricting the distributions of the 
hidden variable $Z$. 

It is clear that the question of testability of the CAR assumption amounts 
to checking whether the set $\mathcal{P}_{CAR}$ is dense in $\Delta_{P_0}$. Before we consider this 
question, we want to note the following: define 

$$M = \{S(h) \in \Delta_{P_0} : h \in \Delta_{Q_0}\}.$$ 

Then $M$ is a convex subset of $\Delta_{P_0}$. Now in analogy to the polar set of a 
subspace of a linear space, we define 

$$M^\circ = \{g \in L^1(P_0)_+ : (\forall h \in M)\ \langle h, g\rangle = 1\}.$$ 

Since for all $g \in L^1(P_0)_+$, $S^*(g) = 1$ is equivalent to 

$$(\forall h \in \Delta_{Q_0})\quad \langle S(h), g\rangle = 1,$$ 

we get that 

$$M \cdot M^\circ = \mathcal{P}_{CAR}.$$ 

Encouraged by this observation, we define 

$$M^{\circ\circ} = \{h \in L^1(P_0)_+ : (\forall g \in M^\circ)\ \langle h, g\rangle = 1\}.$$ 

Figure 1 shows the situation when $P_0$ has a support of 3 points (so we 
can view $\Delta_{P_0}$ as a triangle) and $M$ is a convex subset of $\Delta_{P_0}$. 

As you can see, we should view $M^{\circ\circ}$ as an extension of $M$ to the edges 
of $\Delta_{P_0}$. The following proposition, together with Lemma 3.9, substantiates 
Figure 1: 

Proposition 3.5. Let $M$ be an arbitrary subset of $\Delta_P$, with $P$ some 
probability measure. 

1. $M \subset M^{\circ\circ} \subset \Delta_P$. 

2. $(M^{\circ\circ})^\circ = M^\circ$. 

Proof. 1. The first inclusion is obvious from the definitions. For the 
second one, it is enough to note that $1 \in M^\circ$, because $M \subset \Delta_P$. 

2. Clearly, $(M^{\circ\circ})^\circ \subset M^\circ$. Let $g \in M^\circ$. If $h \in M^{\circ\circ}$, then $\langle h, g\rangle = 1$ 
(because $g \in M^\circ$), so $g \in (M^{\circ\circ})^\circ$. □ 

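Since $M \cdot M^\circ \subset M^{\circ\circ} \cdot M^\circ \subset \Delta_P$, a natural necessary condition on $M$ for 
$M \cdot M^\circ$ to be dense in $\Delta_P$ would be $M^{\circ\circ} \subset \overline{M}$. The following proposition 
more or less substantiates this statement. We do have to caution the reader 
that in principle $M^\circ$ and $M^{\circ\circ}$ need not be closed sets, since the linear 
functional $h \mapsto \langle h, g\rangle$ is not continuous on $L^1(P)$ if $g \in L^1(P)_+ \setminus L^\infty(P)$. 

Proposition 3.6. Let $M$ be a subset of $\Delta_P$ such that $M^{\circ\circ} \not\subset \overline{M}$. Then 
there exist $h \in M^{\circ\circ}$ and $\varepsilon > 0$ such that for all $f \in M \cdot M^\circ$, 

$$\int -\log\Big(\frac{f}{h}\Big)\, h\, dP \ge \varepsilon.$$ 

Proof. Choose $h \in M^{\circ\circ} \setminus \overline{M}$. Then there exists $\varepsilon > 0$ such that for all 
$\tilde h \in M$, $\|\tilde h - h\| \ge \varepsilon$. It is a well-known inequality for the Kullback-Leibler 
divergence [see, e.g., van der Vaart (1998), page 62] that 

$$\int -\log\Big(\frac{\tilde h}{h}\Big)\, h\, dP \ge \tfrac14 \|\tilde h - h\|^2.$$ 

Now let $f \in M \cdot M^\circ$, so $f = \tilde h g$, with $\tilde h \in M$ and $g \in M^\circ$. Note that $\langle g, h\rangle = 1$, 
since $h \in M^{\circ\circ}$. So, using Jensen's inequality for the second term, 

$$\int -\log\Big(\frac{f}{h}\Big)\, h\, dP = \int -\log\Big(\frac{\tilde h}{h}\Big)\, h\, dP + \int -\log(g)\, h\, dP \ge \tfrac14 \varepsilon^2 - \log\Big(\int g h\, dP\Big) = \tfrac14 \varepsilon^2. \qquad \square$$ 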

We have to point out that this proposition does not state that, under the 
assumption that $M^{\circ\circ} \not\subset \overline{M}$, $M \cdot M^\circ$ is not dense in $\Delta_P$. We were not able 
to prove that statement in general. However, it does indicate that $M \cdot M^\circ$ 
is not dense in $\Delta_P$, and in specific examples it should not be too hard to 
actually prove it. 
Example 3.7 (Current status). As we have seen already, we consider a 
time of interest $Y \in [0,1]$, a censoring time $C \in [0,1]$, and the data consists 
of $(C, 1_{\{Y<C\}})$. We take 

$$Q_0(dt) = dt \quad\text{and}\quad P_0(dx, \delta) = x\, dx \cdot 1_{\{\delta=1\}} + (1 - x)\, dx \cdot 1_{\{\delta=0\}}.$$ 

It is easily seen that our map $S$ is equal to 

$$S(h)(x, \delta) = \frac{1}{x} \int_0^x h(t)\, dt \cdot 1_{\{\delta=1\}} + \frac{1}{1-x} \int_x^1 h(t)\, dt \cdot 1_{\{\delta=0\}}.$$ 

Remember that $M = S(\Delta_{Q_0})$, so for all $\tilde h \in M$, we have that $x \tilde h(x, 1)$ is 
increasing in $x$. Now choose 

$$h(t) = 1_{\{t<1/3\}} - 1_{\{1/3<t<2/3\}} + 3 \cdot 1_{\{t>2/3\}}.$$ 

Then $\langle h, 1\rangle = 1$ and $S(h) \ge 0$, so $S(h) \in M^{\circ\circ}$, but $x S(h)(x, 1) = \int_0^x h(t)\, dt$ is 
not increasing in $x$, so $S(h) \notin M$. It was this observation that led us to find 
the test described in Example 2.2. 
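
The three properties claimed for this particular $h$ (total integral 1, $S(h) \ge 0$, and a primitive that is not increasing) can be checked with exact arithmetic. The sketch below is an illustration only; the function name `H`, which stands for the primitive of $h$, and the evaluation grid are assumptions introduced here.

```python
from fractions import Fraction as F

def H(x):
    """H(x) = integral of h over [0, x], for h = 1 on [0, 1/3),
    h = -1 on (1/3, 2/3) and h = 3 on (2/3, 1]."""
    x = F(x)
    third = F(1, 3)
    if x <= third:
        return x
    if x <= 2 * third:
        return third - (x - third)
    return 3 * (x - 2 * third)

grid = [F(k, 60) for k in range(61)]
values = [H(x) for x in grid]

# x * S(h)(x, 1) equals H(x) and (1 - x) * S(h)(x, 0) equals 1 - H(x),
# so S(h) >= 0 amounts to 0 <= H <= 1 on [0, 1].
nonnegative = all(0 <= v <= 1 for v in values)
# H decreases somewhere, so S(h) cannot be of the form S(density).
decreases = any(values[i + 1] < values[i] for i in range(len(values) - 1))
```

On this grid `nonnegative` and `decreases` are both true, in line with the conclusion that $S(h)$ lies in $M^{\circ\circ}$ but not in $M$.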

The statement we would like to prove for $M \subset \Delta_P$ is that $M \cdot M^\circ$ is dense 
in $\Delta_P$ if and only if $M^{\circ\circ} \subset \overline{M}$. However, we were not able to prove it in this 
generality, nor find a counterexample to it. Only when $P$ has finite support 
were we able to prove the statement in full generality: 

Theorem 3.8. Let $P$ be a probability measure with finite support and let 
$M \subset \Delta_P$ such that there exists $h_0 \in M$ with $h_0 > 0$. Then $M \cdot M^\circ$ is dense 
in $\Delta_P$ if and only if $\overline{M} = M^{\circ\circ}$. 

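Proof. Let $\overline{M} \ne M^{\circ\circ}$. Since we are now in the situation that $L^1(P) = 
L^\infty(P)$, it follows that $M^{\circ\circ}$ is closed, so we always have $\overline{M} \subset M^{\circ\circ}$. According 
to Proposition 3.6, there exist $h \in M^{\circ\circ}$ and $\varepsilon > 0$ such that for all $f \in M \cdot M^\circ$, 

$$\int -\log\Big(\frac{f}{h}\Big)\, h\, dP \ge \varepsilon.$$ 

Since $h_0 \in M^{\circ\circ}$, we can choose $h > 0$ [note that $\varepsilon h_0 + (1 - \varepsilon) h \in M^{\circ\circ}$, for 
all $1 > \varepsilon > 0$]. Since $\{f > 0 : f \in \Delta_P\}$ is an open subset of $\Delta_P$ and since 
$f \mapsto \int -\log(f/h)\, h\, dP$ is continuous on this set (so, in particular, continuous 
at $h$), we conclude that there exists $\eta > 0$ such that for all $f \in M \cdot M^\circ$, 

$$\|f - h\| \ge \eta.$$ 

Now let $\overline{M} = M^{\circ\circ}$. Choose $f \in \Delta_P$ with $f > 0$. Since $\overline{M}$ is compact and 
$h \mapsto \int -\log(h/f)\, f\, dP$ is lower semi-continuous (see also Lemma 3.11), there 
exists $\tilde h \in \overline{M}$ that minimizes this Kullback-Leibler divergence. It is also clear 
that $\tilde h > 0$, since otherwise the Kullback-Leibler divergence would be $+\infty$ 
(here we use that $h_0 \in M$). Now let $h \in \overline{M}$. Since $\tilde h > 0$, there exists $\varepsilon > 0$ 
such that when $|\lambda| < \varepsilon$, $\tilde h + \lambda(h - \tilde h) \ge 0$. This means that $\tilde h + \lambda(h - \tilde h) \in 
M^{\circ\circ} = \overline{M}$, because clearly $\langle \tilde h + \lambda(h - \tilde h), g\rangle = 1$ for all $g \in M^\circ$. The function 

$$\lambda \mapsto \int -\log\Big(\frac{\tilde h + \lambda(h - \tilde h)}{f}\Big)\, f\, dP$$ 

has a minimum at $\lambda = 0$ for $\lambda \in\, ]-\varepsilon, \varepsilon[$, so the derivative at $\lambda = 0$ (which 
exists!) must be zero. A simple calculation yields 

$$\int (h - \tilde h)\, \frac{f}{\tilde h}\, dP = 0.$$ 

This proves that $\langle h, f/\tilde h\rangle = 1$ for all $h \in M$, so $f/\tilde h \in M^\circ$. Therefore, $f \in 
\overline{M} \cdot M^\circ$. It is not hard to see that if $h_n \to \tilde h$, then $h_n \cdot f/\tilde h \to f$, which proves 
that $M \cdot M^\circ$ is dense in $\Delta_P$. □ 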

This theorem is very much like the theorem in Section 2 of Gill, van der 
Laan and Robins (1997) and also the proof is very similar. To show how their 
theorem (apart from the uniqueness statement) follows from Theorem 3.8, 
we translate their setup into ours. Let $\mathcal{Y}$ be a finite space with $m$ points 
and let $\mathcal{X} = \mathcal{P}(\mathcal{Y}) \setminus \{\emptyset\}$, the collection of all nonempty subsets of $\mathcal{Y}$. The 
idea is that one observes $X \subset \mathcal{Y}$ such that $Y \in X$. To reformulate the CAR 
assumption used in Gill, van der Laan and Robins (1997), we define $\mathcal{Z} = 
\{(y, A) : y \in A \subset \mathcal{Y}\}$ and $\mu_0$ as the rescaled counting measure on $\mathcal{Z}$, assigning 
mass $1/(m 2^{m-1})$ to each element of $\mathcal{Z}$. Obviously, we define $\pi(y, A) = y$ and 
$\psi(y, A) = A$, so $Q_0 = \pi(\mu_0)$ is the rescaled counting measure on $\mathcal{Y}$ (assigning 
mass $1/m$ to each point) and $P_0 = \psi(\mu_0)$ satisfies 

$$P_0(A) = \frac{|A|}{m 2^{m-1}} \qquad (A \subset \mathcal{Y}),$$ 

where $|A|$ denotes the number of elements of $A$. Now we define $S: L^1(Q_0) \to 
L^1(P_0)$ such that for all $h \in L^1(Q_0)$ and $A \in \mathcal{X}$, we have 

(3.2) $S(h)(A) = E_{\mu_0}(h(Y) \mid X = A) = \frac{1}{|A|} \sum_{y \in A} h(y)$. 

It follows immediately that for $g \in L^1(P_0)$ and $y \in \mathcal{Y}$, we have 

$$S^*(g)(y) = 2^{1-m} \sum_{A \ni y} g(A).$$ 

The CAR assumption now states that the likelihood of $X$ with respect to 
$P_0$ equals $g \cdot S(h)$, where $h$ is an arbitrary density with respect to $Q_0$ and 
$g \in L^1(P_0)_+$ such that $S^*(g) = 1$. If we would follow Definition 1.1, we would 
restrict the possible distributions $\mu$ of $Z = (Y, X)$ such that 

$$\mu(X = A \mid Y = y) = g(A)\, \mu_0(X = A \mid Y = y) = 2^{1-m} g(A)\, 1_{\{y \in A\}}.$$ 
It is not hard to see that this is indeed equivalent to the definition of Gill, 
van der Laan and Robins (1997) used for finite sample spaces. So, in fact, 
they use a very specific form of the map $S$; even in finite sample spaces our 
setup is much less restrictive. Finally, to conclude that in this case CAR 
cannot be tested, we use Theorem 3.8 to see that we only need to check that 
when we define 

$$M = \{S(h) : h \in \Delta_{Q_0}\},$$ 

we have $M^{\circ\circ} = M$. We will use the following lemma. 

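Lemma 3.9. Let $P$ be a measure with finite support and let $M \subset \Delta_P$. 
Then 

$$M^{\circ\circ} = \langle M\rangle \cap \Delta_P.$$ 

Here $\langle M\rangle$ denotes the linear span of $M$. 

Proof. Let $h \in \langle M\rangle \cap \Delta_P$, so $h = \sum_i \lambda_i h_i$ with $\lambda_i \in \mathbb{R}$ and $h_i \in M$ such 
that $h \ge 0$ and $\langle h, 1\rangle = 1$. This means that $\sum_i \lambda_i = 1$. If $g \in M^\circ$, then for 
every $i$, $\langle h_i, g\rangle = 1$, so we conclude that $\langle h, g\rangle = 1$, and, therefore, $h \in M^{\circ\circ}$. 
We have shown that $\langle M\rangle \cap \Delta_P \subset M^{\circ\circ}$. 

Now suppose $h \in \Delta_P$ and $h \notin \langle M\rangle$. Since $L^1(P)$ is finite dimensional, there 
exists $\phi \in L^\infty(P)$ such that for all $\tilde h \in \langle M\rangle$, we have $\langle \tilde h, \phi\rangle = 0$ and $\langle h, \phi\rangle > 0$. 
We can choose $\phi$ such that $|\phi| \le 1$. Define $g = 1 + \phi$. Then $g \ge 0$ and for 
$\tilde h \in M$ we have $\langle \tilde h, g\rangle = 1$, so $g \in M^\circ$. However, $\langle h, g\rangle > 1$, so $h \notin M^{\circ\circ}$. Since 
$M^{\circ\circ} \subset \Delta_P$, we have shown that $M^{\circ\circ} \subset \langle M\rangle \cap \Delta_P$. □ 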

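When $M = \{S(h) : h \in \Delta_{Q_0}\}$, it is easy to check that $\langle M\rangle \cap \Delta_{P_0} = \{S(h) : h \in 
L^1(Q_0),\ \langle h, 1\rangle = 1,\ S(h) \ge 0\}$. Therefore, whenever $\mathcal{X}$ is a finite set, $M^{\circ\circ} = M$ 
is equivalent to 

(3.3) $S(h) \ge 0 \Rightarrow \exists\, \tilde h \ge 0 : S(\tilde h) = S(h)$ $[\forall h \in L^1(Q_0)]$. 

For the map $S$ we were considering, this follows trivially from (3.2) [note 
that $S(h)(\{y\}) = h(y)$]. 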
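
The finite setup of Gill, van der Laan and Robins described above is small enough to compute explicitly. The following sketch, written for $m = 3$, is an illustration and not part of the original paper: the names `S`, `S_star` and `pair`, the choice of $m$, and the test functions `h` and `k` are assumptions made for the example.

```python
from fractions import Fraction as F
from itertools import combinations

m = 3
space = range(m)
# All nonempty subsets of the m-point space.
subsets = [frozenset(c) for r in range(1, m + 1) for c in combinations(space, r)]

Q0 = {y: F(1, m) for y in space}                          # mass 1/m per point
P0 = {A: F(len(A), m * 2 ** (m - 1)) for A in subsets}    # |A| / (m 2^(m-1))

def S(h):
    """S(h)(A) = average of h over A, as in (3.2)."""
    return {A: sum(h[y] for y in A) / len(A) for A in subsets}

def S_star(g):
    """S*(g)(y) = 2^(1-m) * sum of g(A) over the sets A containing y."""
    return {y: F(1, 2 ** (m - 1)) * sum(g[A] for A in subsets if y in A)
            for y in space}

def pair(u, v, measure):
    """Duality pairing <u, v> = integral of u * v with respect to `measure`."""
    return sum(u[key] * v[key] * measure[key] for key in measure)

h = {0: F(3, 2), 1: F(1), 2: F(1, 2)}               # a density w.r.t. Q0
k = {A: F(len(A)) + min(A) for A in subsets}        # arbitrary test function
one = {A: F(1) for A in subsets}
```

With these definitions, `S_star(one)` is identically 1, the pairing satisfies `pair(S(h), k, P0) == pair(h, S_star(k), Q0)`, and `S(h)` evaluated at a singleton returns `h` at that point, which is the observation used for (3.3).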

The problem with extending the proof of Theorem 3.8 to general $P$ is 
twofold. First of all, $M$ will not be compact in general, which makes it 
difficult to find a minimum for the Kullback-Leibler divergence. The second 
problem is concluding that the derivative is zero: even if we find a 
minimum (in some compactification), we can only conclude that the directional 
derivative we used in the previous proof is negative, but not necessarily zero. 
To solve these problems and come up with a theorem that can be used for 
practical situations, we will use the map $S$ more extensively by putting 
restrictions on it. But first we will discuss an extension of the Kullback-Leibler 
divergence to solve the noncompactness problem. 
Definition 3.10. Let $E = (L^\infty(P))^*$, the (strong) dual of the Banach 
space $L^\infty(P)$. Let $f \in L^1(P)_+$. Define for $h \in E$, $h \ge 0$, 

KLf{h) = supjx: - ^ E = l}- 

We would like to make a few remarks. As $E$ is the dual of an ordered 
Banach space, it is itself ordered in the obvious way: $h \ge 0$ if for all 
$\phi \in L^\infty(P)_+$, $\langle h, \phi\rangle \ge 0$. Furthermore, $L^1(P) \subset E$. We also have that the unit 
ball of $E$ is weakly compact (Banach-Alaoglu), and if $h \in E_+$ (i.e., $h$ is 
positive), we have that $\|h\| = \langle h, 1\rangle$. Since $KL_f$ is the supremum of weakly 
continuous functions on $E_+$, it is itself weakly lower semi-continuous. If 
$M \subset \Delta_P$, then $\overline{M}^w$ (the closure of $M$ in the weak topology, seen as a subset of 
$E$) will be weakly compact, because $\overline{M}^w \subset E_+$ and for all $h \in \overline{M}^w$, $\langle h, 1\rangle = 1$, 
so it is a weakly closed subset of the unit ball. This means that $KL_f$ will 
attain its minimum on $\overline{M}^w$ for some $h \in \overline{M}^w$. 

From the theory of ordered vector lattices [see, e.g., Schaefer and Wolff
(1999), Chapter V] it follows that $L^1(P)$ is a band in $E$. This means that each
$h \in E_+$ can be uniquely decomposed as $h = h_{/\!/} + h_\perp$, where $h_{/\!/} \in L^1(P)_+$
and $h_\perp \ge 0$ is disjoint from $L^1(P)$, so for all $f \in L^1(P)_+$, we have that
$\inf(h_\perp, f) = 0$ (compare this to the decomposition of a measure into a part
that is absolutely continuous to some other measure and a part which is
disjoint from this other measure). We have the following lemma, the proof
of which is deferred to the Appendix.

Lemma 3.11. Let $f \in L^1(P)_+$. Then, in the notation introduced above,
for all $h \in E_+$,

$$ KL_f(h) = KL_f(h_{/\!/}) = \int -\log\Big(\frac{h_{/\!/}}{f}\Big) f \, dP. $$
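
As a finite sanity check of Definition 3.10 and Lemma 3.11, the following sketch evaluates the partition sums on a four-point space, where $E = L^1(P)$ and every $h$ equals its absolutely continuous part. It assumes the definition of $KL_f$ as a supremum of sums $\sum_i -\log(\langle h,\phi_i\rangle/\langle f,\phi_i\rangle)\langle f,\phi_i\rangle$, and it only tries indicator partitions rather than general $\phi_i$; the weights `p` and densities `f`, `h` are made-up test values, not taken from the paper.

```python
import math

def kl_partition(f, h, p, blocks):
    """Sum over blocks B of -log(<h,1_B>/<f,1_B>) * <f,1_B>, pairing w.r.t. weights p."""
    total = 0.0
    for block in blocks:
        f_mass = sum(f[i] * p[i] for i in block)
        h_mass = sum(h[i] * p[i] for i in block)
        if f_mass == 0.0:
            continue              # convention: log(1/0) * 0 = 0
        if h_mass == 0.0:
            return math.inf       # f puts mass where h puts none
        total += -math.log(h_mass / f_mass) * f_mass
    return total

p = [0.25, 0.25, 0.25, 0.25]      # hypothetical probability measure P
f = [0.4, 0.8, 1.2, 1.6]          # density f w.r.t. P (integrates to 1)
h = [1.6, 1.2, 0.8, 0.4]          # density h w.r.t. P (integrates to 1)

trivial = kl_partition(f, h, p, [[0, 1, 2, 3]])
coarse = kl_partition(f, h, p, [[0, 1], [2, 3]])
fine = kl_partition(f, h, p, [[0], [1], [2], [3]])

# Integral form from Lemma 3.11: int -log(h/f) f dP.
integral = sum(-math.log(h[i] / f[i]) * f[i] * p[i] for i in range(4))
```

Refining the partition can only increase the sum (this is Jensen's inequality for $-\log$), so on a finite space the finest partition attains the supremum and reproduces the integral.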

Now we will consider a coarsening $S: L^1(Q_0) \to L^1(P_0)$. Define $E_{Q_0} =
(L^\infty(Q_0))'$ and $E_{P_0} = (L^\infty(P_0))'$. By considering the dual map of $S^*$, we can
extend $S: E_{Q_0} \to E_{P_0}$. Clearly, $S$ will be continuous for the weak topologies
on $E_{Q_0}$ and $E_{P_0}$ (as well as for the strong topologies) and $S$ will be a
positive map. Define $M = S(\Delta_{Q_0})$. Since $\overline{\Delta_{Q_0}}^w \subset E_{Q_0}$ is weakly compact,
$\overline{M}^w = S(\overline{\Delta_{Q_0}}^w)$ $(\subset E_{P_0})$. When $h \in E_{Q_0,+}$, we can consider $h_{/\!/} \in L^1(Q_0)_+$
as well as $S(h)_{/\!/} \in L^1(P_0)_+$. In general, we can only deduce that $S(h)_{/\!/} \ge
S(h_{/\!/})$, since $h = h_{/\!/} + h_\perp$ and $S(h_{/\!/}) \in L^1(P_0)_+$.

Before we can state our main result, we need two assumptions. The first
is the analogue of $M = M^{\circ\circ}$, or equation (3.3) which we discussed before,
but slightly stronger:

(A1) For all $h' \in E_{Q_0,+}$ such that $S(h')_{/\!/} > 0$, there exists $h \in E_{Q_0,+}$ with
$S(h) = S(h')$ and $h_{/\!/} > 0$.

How we will use assumption (A1) is stated in the following lemma: we
say that $h_1 \in L^1(P)_+$ dominates $h_2 \in L^1(P)_+$ (notation: $h_2 \prec h_1$), if there
exists $R > 0$ such that $h_2 \le R h_1$.

Lemma 3.12. Suppose $h_0 \in E_{Q_0,+}$ such that $h_{0,/\!/} > 0$. Let $h \in L^1(Q_0)_+$.
Then there exists a sequence $h_n \in L^1(Q_0)_+$ such that $h_n \prec h_{0,/\!/}$ and $h_n \uparrow h$.

Proof. Define $f = h_{0,/\!/}$. Let $h \in L^1(Q_0)_+$. Define

$$ h_n = h \cdot 1_{\{f > 1/n\}} \wedge n. $$

Since $f > 0$, $h_n \uparrow h$. Furthermore, $h_n \le n^2 \cdot f$, because on $\{f > 1/n\}$, $h_n \le n$.

□
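
The truncation in the proof of Lemma 3.12 can be tried out pointwise. The sketch below assumes the construction reads $h_n = (h \cdot 1_{\{f > 1/n\}}) \wedge n$ with the resulting bound $h_n \le n^2 f$; the lists `f` and `h` are arbitrary test values on a four-point grid and the helper `truncate` is a hypothetical name, not part of the paper.

```python
def truncate(h, f, n):
    """Pointwise h_n = (h * 1_{f > 1/n}) ∧ n on a finite grid."""
    return [min(hv, n) if fv > 1.0 / n else 0.0 for hv, fv in zip(h, f)]

f = [0.05, 0.3, 1.0, 2.5]   # strictly positive values, playing the role of h_{0,//}
h = [7.0, 0.5, 12.0, 3.0]   # an arbitrary nonnegative h

# h_1, h_2, ..., h_40
seq = [truncate(h, f, n) for n in range(1, 41)]

# Domination: h_n <= n^2 * f at every grid point.
dominated = all(
    hn[i] <= n * n * f[i]
    for n, hn in enumerate(seq, start=1)
    for i in range(len(f))
)

# Monotonicity: h_n <= h_{n+1} at every grid point.
increasing = all(
    a[i] <= b[i] for a, b in zip(seq, seq[1:]) for i in range(len(f))
)
```

On this grid the sequence reaches $h$ itself once $1/n$ drops below the smallest value of `f` and $n$ exceeds the largest value of `h`.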

Since $h_n \uparrow h$ implies that $S(h_n) \uparrow S(h)$, (A1) can be seen as an
approximation property for $M = S(\Delta_{Q_0})$. We will need a similar approximation
property for $M^\circ = \{g \in L^1(P_0)_+ : S^*(g) = 1\}$:

(A2) For all $g \in L^1(P_0)_+$ such that $S^*(g) = 1$ and $g > 0$, there exists a
sequence $g_n \in L^1(P_0)_+$ such that $S^*(g_n) = \|g_n\|_1 \cdot 1$, $g_n \prec g$ and $g_n \uparrow 1$.

We will first show how these two assumptions are used to prove our main 
theorem, after which we will show in two examples how one checks these 
assumptions. 

Theorem 3.13. Let $S: L^1(Q_0) \to L^1(P_0)$ be a coarsening satisfying (A1)
and (A2). Then the CAR assumption cannot be tested, so $\mathcal{P}_{CAR}$ is dense in
$\Delta_{P_0}$.

Proof. Define $M = S(\Delta_{Q_0})$ $(\subset \Delta_{P_0})$ and $M^\circ = \{g \in L^1(P_0)_+ : S^*(g) =
1\}$. We have to prove that $M \cdot M^\circ$ is dense in $\Delta_{P_0}$. Let $f \in \Delta_{P_0}$ such that
$f > 0$ and $f$ is bounded; the set of all these functions is clearly dense in $\Delta_{P_0}$,
so it is enough to prove that $f \in \overline{M \cdot M^\circ}$.

Clearly, $KL_f(1) < +\infty$, so the infimum of $KL_f$ on $\overline{M}^w \subset E_{P_0}$ is finite
(because $1 \in M$). As noted before, since $KL_f$ is weakly lower semi-continuous
and $\overline{M}^w$ is weakly compact, $KL_f$ attains its minimum somewhere in $\overline{M}^w$, let
us say in $k \in \overline{M}^w$. Using Lemma 3.11, we can see that $k_{/\!/} > 0$, since otherwise
$KL_f(k) = KL_f(k_{/\!/}) = +\infty$ (here we use that $f > 0$). Since $\overline{M}^w = S(\overline{\Delta_{Q_0}}^w)$,
we can choose $h_0 \in E_{Q_0,+}$ with $S(h_0) = k$ and $h_{0,/\!/} > 0$ [here we use (A1)].

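Let $h \in \Delta_{Q_0}$. According to Lemma 3.12, there exists a sequence $h_n \in
L^1(Q_0)_+$ with $h_n \prec h_{0,/\!/}$ and $h_n \uparrow h$ [and, therefore, $S(h_n) \uparrow S(h)$]. Define
$a_n = \langle h_n, 1\rangle$ (so $a_n \uparrow 1$) and fix $n$. Because $h_n \prec h_{0,/\!/}$, there exists $0 < \varepsilon < 1$
such that for all $\lambda \in \,]-\varepsilon,\varepsilon[$,

$$ h_{0,/\!/} + \lambda(a_n^{-1} h_n - h_{0,/\!/}) \ge 0. $$

We conclude that $h_0 + \lambda(a_n^{-1} h_n - h_0) \in \overline{\Delta_{Q_0}}^w$, and so $k + \lambda(a_n^{-1} S(h_n) - k) \in \overline{M}^w$.
Therefore, for all $\lambda \in \,]-\varepsilon,\varepsilon[$,

$$ \int -\log\Big(\frac{k_{/\!/} + \lambda(a_n^{-1} S(h_n) - k_{/\!/})}{f}\Big) f \, dP_0 \ge \int -\log\Big(\frac{k_{/\!/}}{f}\Big) f \, dP_0, $$

which by differentiating at $\lambda = 0$ implies

$$ \Big\langle S(h_n), \frac{f}{k_{/\!/}} \Big\rangle = a_n. $$

Since $S(h_n) \uparrow S(h)$, we conclude for every $h \in \Delta_{Q_0}$, that $\langle S(h), f/k_{/\!/}\rangle = 1$,
which proves that $f/k_{/\!/} \in L^1(P_0)_+$ and that $S^*(f/k_{/\!/}) = 1$, so $f/k_{/\!/} \in M^\circ$.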

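We would like to conclude that $k_{/\!/} \in M$, but we only know that $k \in \overline{M}^w$.
We do know, however, that for all $g' \in M^\circ$,

(3.4) $\langle k_{/\!/}, g'\rangle \le 1$.

For if we choose a sequence $g_n \in L^\infty(P_0)_+$ such that $g_n \uparrow g'$, then $S^*(g_n) \le
S^*(g') = 1$, so

$$ \langle k_{/\!/}, g'\rangle = \lim\uparrow \langle k_{/\!/}, g_n\rangle \le \lim\uparrow \langle k, g_n\rangle \le 1. $$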

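Now we can use (A2). Define $g = f/k_{/\!/} \in M^\circ$. Clearly, $g > 0$, so there exists
a sequence $g_n \in L^1(P_0)_+$ with $S^*(g_n) = \|g_n\|_1 \cdot 1$, such that $g_n \prec g$ and $g_n \uparrow 1$.
Define $b_n = \|g_n\|_1 \uparrow 1$. There exists $\varepsilon > 0$ such that for all $\lambda \in \,]-\varepsilon,\varepsilon[$, $g +
\lambda(b_n^{-1} g_n - g) \ge 0$, so $g + \lambda(b_n^{-1} g_n - g) \in M^\circ$. Since we have (3.4) and $\langle k_{/\!/}, g\rangle =
1$, we conclude that $\langle k_{/\!/}, g_n\rangle = b_n$, so $\langle k_{/\!/}, 1\rangle = 1$. But this means that $k = k_{/\!/}$,
since $k_{/\!/} \le k$ and $\langle k, 1\rangle = 1$. So $k \in L^1(P_0)$ and $k$ is the weak limit [with
respect to the duality with $L^\infty(P_0)$] of functions in $M$. However, $L^\infty(P_0)$ is
the dual of $L^1(P_0)$ and $M$ is convex, so the weak closure of $M$ in $L^1(P_0)$
equals the strong closure $\overline{M}$, which means that $k \in \overline{M}$. Now choose $\{k_m\} \subset
M$ such that $\|k - k_m\|_1 \to 0$. We know that $g \cdot k_m \in \Delta_{P_0}$ and $g \cdot k = f \in \Delta_{P_0}$.
This means that $\sqrt{g k_m}$ and $\sqrt{g k}$ are positive elements of the unit sphere of
$L^2(P_0)$. Since the unit ball is weakly compact in $L^2(P_0)$, we can choose a
weakly converging subsequence of $\{\sqrt{g k_m}\}$, let us say $\sqrt{g k_{m_n}} \to \phi$, for some
$\phi$ in the unit ball. This means, in particular, that for any $\psi \in L^\infty(P_0)$,

$$ \langle \phi, \psi\rangle = \lim_{n\to\infty} \langle \sqrt{g k_{m_n}}, \psi\rangle = \lim_{n\to\infty} \langle \sqrt{k_{m_n}}, \psi\sqrt{g}\rangle = \langle \sqrt{k}, \psi\sqrt{g}\rangle. $$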

The last equality follows from the well-known fact that the Hellinger metric
induces the same topology on $\Delta_{P_0}$ as the $L^1$-norm, so $\sqrt{k_{m_n}} \to \sqrt{k}$ in $L^2(P_0)$,
and the fact that $\psi\sqrt{g} \in L^2(P_0)$. Since $L^\infty(P_0)$ is dense in $L^2(P_0)$, we have
shown that $\phi = \sqrt{g k}$. This means that every weakly convergent subsequence
has the same limit $\sqrt{g k}$, which in turn proves that $\sqrt{g k_m}$ converges weakly
to $\sqrt{g k}$. Now note that if for some $\phi \in L^2(P_0)$, we have that $\|\phi\|_2 = 1$, then
a neighborhood base for the $L^2$-topology on the unit ball around $\phi$ is given
by

$$ U_n = \Big\{ \psi \in L^2(P_0) : \|\psi\|_2 \le 1 \text{ and } \langle \psi, \phi\rangle > 1 - \frac{1}{n} \Big\}, $$

since one easily checks that for any $\psi \in U_n$, $\|\psi - \phi\|_2^2 \le 2/n$. This means that
if $\{\psi_n\}$ is a sequence in the unit ball converging weakly to $\phi$, then $\psi_n \to \phi$
in $L^2(P_0)$. Through this we conclude that $\sqrt{g k_m} \to \sqrt{g k}$ in $L^2(P_0)$, which
implies that $g k_m \to g k$ in $L^1(P_0)$. So $k \cdot g = f \in \overline{M \cdot M^\circ}$, which proves that
$\mathcal{P}_{CAR}$ is $L^1$-dense in $\Delta_{P_0}$, and, hence, the CAR assumption is not testable.
□


We wish to stress that in our opinion the only natural (necessary and
sufficient) condition on $S$ for $\mathcal{P}_{CAR}$ to be dense in $\Delta_{P_0}$ is equation (3.3):

$$ S(h) \ge 0 \;\Longrightarrow\; \exists\, \tilde{h} \ge 0 : S(\tilde{h}) = S(h) \qquad [\forall h \in L^1(Q_0)]. $$

This is illustrated by Figure 1 and we have not been able to find
counterexamples to this claim. The stronger condition (A1) and condition (A2) were
necessary to make our proof work, but must be seen as regularization
conditions. We know of examples where (A1) and/or (A2) fail, but we still have
the result that CAR cannot be tested. In these examples, the main ideas of
the proof of Theorem 3.13 still work, but the details are a bit different.

We will try to illustrate the theorem by two examples, which we will 
discuss in detail. 

Example 3.14 (Missing data). Let $Y \in \mathcal{Y}$ be the variable we wish to
observe, distributed according to $Q_0$. However, sometimes we can observe $Y$
directly and sometimes the observation is missing, which we will denote by
saying that our observation is $\dagger$. To make things precise, we define our data
space $\mathcal{X} = \mathcal{Y} \cup \{\dagger\}$. Furthermore, we will use a hidden space to define our
coarsening $S$: define $\mathcal{Z} = \mathcal{Y} \times \{0,1\}$ and the map $\psi: \mathcal{Z} \to \mathcal{X}$ as $\psi(y, 1) = y$
($Y$ is not missing), $\psi(y, 0) = \dagger$ ($Y$ is missing). Choose $\mu_0 = Q_0 \times (\tfrac12\delta_0 + \tfrac12\delta_1)$,
so one possible CAR distribution is that each observation has probability $\tfrac12$
of being missing, independently of $Y \sim Q_0$. It also means that $P_0 = \psi(\mu_0) =
\tfrac12 1_{\mathcal{Y}} \cdot Q_0 + \tfrac12 \delta_\dagger$. Then for $h \in L^1(Q_0)$, we define $S(h) \in L^1(P_0)$ as follows:

$$ S(h)(y) = E_{\mu_0}(h(Y) \mid X = y) = h(y) \quad (\text{for } y \in \mathcal{Y}) $$

and

$$ S(h)(\dagger) = E_{\mu_0}(h(Y) \mid X = \dagger) = E_{Q_0}(h(Y)). $$

It is not hard to check that for $g \in L^\infty(P_0)$,

$$ S^*(g)(y) = E_{\mu_0}(g(X) \mid Y = y) = \tfrac12 g(y) + \tfrac12 g(\dagger), $$

so indeed $S^*(1) = 1$, which shows that $S$ is a coarsening. Then

$$ \mathcal{P}_{CAR} = \{ g \cdot S(h) : h \in \Delta_{Q_0},\ g \ge 0,\ S^*(g) = 1 \}. $$

Since $S^*(g) = 1$ implies that $g(y) = 2 - g(\dagger)$ for $Q_0$-almost all $y$, we see that
we get all distributions in $\mathcal{P}_{CAR}$ by allowing $Y$ to be distributed according
to an arbitrary density $h$ with respect to $Q_0$ and assuming that each
observation has an arbitrary probability $p = \tfrac12 g(\dagger)$ to be missing, independently
of $Y$.

Now we would like to check assumptions (A1) and (A2). In this case
(A2) is trivial, because if $g \in L^1(P_0)$ and $g > 0$ such that $S^*(g) = 1$, then
$0 < \min(2 - g(\dagger), g(\dagger)) \le g$, so $1 \prec g$. Assumption (A1) is also not so hard
to check, since if we restrict $S(h)$ to $\mathcal{Y}$, we get that $S(h) = \tfrac12 h$, seen as
elements of $L^1(Q_0)$. This shows that $S(h)_{/\!/} = \tfrac12 h_{/\!/}$, so $S(h)_{/\!/} > 0$ clearly
implies $h_{/\!/} > 0$. Theorem 3.13 now states that the CAR assumption cannot
be tested in this case, so $\mathcal{P}_{CAR}$ is dense in $\Delta_{P_0}$. Clearly, in this simple
example it is very easy to directly verify that, in fact, $\mathcal{P}_{CAR} = \Delta_{P_0}$.
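
The direct verification mentioned at the end of Example 3.14 can be sketched on a three-point space with exact rational arithmetic. The code assumes the formulas $S(h)(y) = h(y)$, $S(h)(\dagger) = E_{Q_0} h(Y)$ and $S^*(g)(y) = \tfrac12 g(y) + \tfrac12 g(\dagger)$ from the example; the measure `q0`, the data density `f` and the helper `car_factors` are hypothetical illustrations, and the factorization shown only covers the case $f(\dagger) < 2$.

```python
from fractions import Fraction as F

DAGGER = "missing"                          # stands for the symbol † in the text
q0 = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}   # hypothetical Q_0 on Y = {0, 1, 2}

# P_0 = (1/2) Q_0 on Y plus mass 1/2 at †.
p0 = {y: q / 2 for y, q in q0.items()}
p0[DAGGER] = F(1, 2)

def S(h):
    """S(h)(y) = h(y) on Y and S(h)(†) = E_{Q_0} h(Y)."""
    out = dict(h)
    out[DAGGER] = sum(h[y] * q0[y] for y in q0)
    return out

def S_star(g):
    """S*(g)(y) = g(y)/2 + g(†)/2."""
    return {y: (g[y] + g[DAGGER]) / 2 for y in q0}

def car_factors(f):
    """Split a density f w.r.t. P_0 (with f(†) < 2) as f = g * S(h)."""
    g = {y: 2 - f[DAGGER] for y in q0}
    g[DAGGER] = f[DAGGER]
    h = {y: f[y] / (2 - f[DAGGER]) for y in q0}
    return g, h

# A density w.r.t. P_0 chosen by hand: sum of f * p0 equals 1.
f = {0: F(1), 1: F(2), 2: F(2), DAGGER: F(1, 2)}
g, h = car_factors(f)
product = {x: g[x] * S(h)[x] for x in p0}
```

Here `g` satisfies $S^*(g) = 1$, `h` is a density with respect to `q0`, and `product` reproduces `f` exactly, so this particular data distribution is of CAR form with missingness probability $\tfrac12 g(\dagger) = \tfrac14$.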

Example 3.15 (Right-censored data). Let $\mathcal{Y} = \,]0,1[$, $Q_0(dt) = dt$ on $\mathcal{Y}$
and $Y$ be a time of interest distributed according to a density with respect
to $Q_0$. All that follows can be easily generalized to an arbitrary measure
on an open subset of $]0,\infty[$, at the cost of some notational difficulty. Let
$C \in \,]0,1[$ be a censoring time and let the data $(X, \Delta)$ consist of

$$ (X, \Delta) = \psi(Y, C) := (Y \wedge C, 1_{\{Y \le C\}}). $$

We will construct our coarsening $S$ as follows: define $\mu_0 = dt\,dc$ on $\mathcal{Y} \times \,]0,1[$.
Then define $P_0 = \psi(\mu_0)$ as a probability measure on the data-space $\mathcal{X}$. One
easily checks that

$$ P_0(dx, \delta) = (1 - x)\,dx \cdot 1_{\{\delta=1\}} + (1 - x)\,dx \cdot 1_{\{\delta=0\}}. $$

Now define for $h \in L^1(Q_0)$,

$$ S(h)(x,\delta) = E_{\mu_0}(h(Y) \mid (X,\Delta) = (x,\delta)). $$

This is just saying that $S(h)$ is the density of the distribution of the data
with respect to $P_0$ when $Y$ and $C$ are independent, $Y$ distributed according
to $h(t)\,dt$ and $C$ distributed according to $dc$. Therefore, one easily calculates

$$ S(h)(x,\delta) = h(x) \cdot 1_{\{\delta=1\}} + \frac{1}{1-x} \int_x^1 h(t)\,dt \cdot 1_{\{\delta=0\}}. $$

Clearly, $S(1) = 1$ and $S$ is positive. Furthermore, for all $h \in L^1(Q_0)$ and
$g \in L^\infty(P_0)$,

$$ \langle S(h), g\rangle_{P_0} = \int_0^1 (1 - x) h(x) g(x,1)\,dx + \int_0^1 \int_x^1 h(t) g(x,0)\,dt\,dx $$
$$ = \int_0^1 (1 - t) g(t,1) h(t)\,dt + \int_0^1 \Big( \int_0^t g(x,0)\,dx \Big) h(t)\,dt $$
$$ = \int_0^1 \Big( (1 - t) g(t,1) + \int_0^t g(x,0)\,dx \Big) h(t)\,dt, $$

so we see that

$$ S^*(g)(t) = (1 - t) g(t,1) + \int_0^t g(x,0)\,dx. $$


Hence, $S^*(1) = 1$, so $S$ is indeed a coarsening. Define $M = S(\Delta_{Q_0})$ and
$M^\circ = \{g \in L^1(P_0)_+ : S^*(g) = 1\}$. Let $h \in \Delta_{Q_0}$ and $g \in M^\circ$. Since $S^*(g) = 1$,

$$ g(t,1) = \frac{1}{1-t} \Big( 1 - \int_0^t g(x,0)\,dx \Big). $$

Because $g \ge 0$, we have that $\int_0^1 g(x,0)\,dx \le 1$. If $\int_0^1 g(x,0)\,dx = 1$ and we let
$C$ be distributed according to $g(x,0)$ and $Y$ according to $h$, we can easily
check that the density of $\psi(Y,C)$ with respect to $P_0$ is exactly $S(h) \cdot g$, so
$\mathcal{P}_{CAR}$ contains all data distributions one gets if $Y$ and $C$ are independent
and dominated by the Lebesgue measure. If we allow $C$ to be distributed
according to a subdensity (i.e., just saying that the censoring time has a
positive probability of being bigger than the largest possible value for $Y$),
then we get all of $\mathcal{P}_{CAR}$.
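
The adjoint formula for $S^*$ in Example 3.15 can be checked numerically. The sketch below is a midpoint-rule approximation, assuming $S(h)(x,1) = h(x)$, $S(h)(x,0) = \frac{1}{1-x}\int_x^1 h(t)\,dt$ and $S^*(g)(t) = (1-t)g(t,1) + \int_0^t g(x,0)\,dx$; the test functions $h(t) = 2t$, $g(x,1) = x$, $g(x,0) = 1$ are arbitrary choices, for which both pairings equal $5/6$ by hand calculation.

```python
def grid(n):
    """Midpoints of n equal cells of ]0,1[."""
    return [(i + 0.5) / n for i in range(n)]

def S(h_vals):
    """S(h)(x,1) = h(x) and S(h)(x,0) = (1/(1-x)) * int_x^1 h(t) dt on the grid."""
    n = len(h_vals)
    xs = grid(n)
    censored, acc = [0.0] * n, 0.0
    for i in reversed(range(n)):
        # acc approximates the integral of h over the cells to the right of x_i
        censored[i] = (acc + 0.5 * h_vals[i] / n) / (1.0 - xs[i])
        acc += h_vals[i] / n
    return list(h_vals), censored

def S_star(g1, g0):
    """S*(g)(t) = (1-t) g(t,1) + int_0^t g(x,0) dx on the grid."""
    n = len(g1)
    xs = grid(n)
    out, acc = [0.0] * n, 0.0
    for i in range(n):
        out[i] = (1.0 - xs[i]) * g1[i] + acc + 0.5 * g0[i] / n
        acc += g0[i] / n
    return out

def pair_P0(u1, u0, g1, g0):
    """<u, g>_{P_0}, where P_0 has density (1-x) on both {delta=1} and {delta=0}."""
    n = len(u1)
    xs = grid(n)
    return sum((1.0 - xs[i]) * (u1[i] * g1[i] + u0[i] * g0[i]) for i in range(n)) / n

def pair_Q0(h_vals, s_vals):
    """<h, s>_{Q_0} with Q_0 Lebesgue measure on ]0,1[."""
    return sum(a * b for a, b in zip(h_vals, s_vals)) / len(h_vals)

n = 2000
xs = grid(n)
h = [2.0 * t for t in xs]          # density h(t) = 2t (arbitrary test choice)
g1 = list(xs)                      # g(x,1) = x (arbitrary bounded test function)
g0 = [1.0] * n                     # g(x,0) = 1

u1, u0 = S(h)
lhs = pair_P0(u1, u0, g1, g0)      # <S(h), g>_{P_0}
rhs = pair_Q0(h, S_star(g1, g0))   # <h, S*(g)>_{Q_0}
ones = [1.0] * n
```

The same helpers also confirm, up to rounding, that $S(1) = 1$ and $S^*(1) = 1$ on the grid, which is the coarsening property used in the text.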

We would now like to check assumptions (A1) and (A2). Denote by
$S(h)|_{\{\delta=1\}}$ the restriction of $S(h)$ to $\{\delta = 1\}$. Clearly, for $h \in E_{Q_0}$, $S(h)|_{\{\delta=1\}} =
(1 - y) \cdot h$; here $(1 - y) \cdot h$ acts on $\phi(y) \in L^\infty(Q_0)$ as follows:

$$ \langle (1 - y) \cdot h, \phi(y)\rangle = \langle h, (1 - y) \cdot \phi(y)\rangle. $$

To check (A1) it is enough to conclude that $h_{/\!/} > 0$ whenever $[(1 - y) \cdot h]_{/\!/} > 0$
and $h \ge 0$. However, in that case $(1 - y) \cdot h \le h$ (because $\|1 - y\|_\infty \le 1$), so
$0 < [(1 - y) \cdot h]_{/\!/} \le h_{/\!/}$.

Now let $g \in M^\circ$, $g > 0$. Define

$$ g_n(t,0) = 1_{\{g(t,0) > 1/n\}}. $$

Clearly, $g_n(t,0) \le n\,g(t,0)$. Define $\lambda_n = \int_0^1 g_n(x,0)\,dx$ and

$$ g_n(t,1) = \frac{1}{1-t} \Big( \lambda_n - \int_0^t g_n(x,0)\,dx \Big). $$

Then we have that

$$ g_n(t,1) = \frac{1}{1-t} \int_t^1 g_n(x,0)\,dx $$
$$ \le \frac{n}{1-t} \int_t^1 g(x,0)\,dx $$
$$ = \frac{n}{1-t} \Big( \int_0^1 g(x,0)\,dx - \int_0^t g(x,0)\,dx \Big) $$
$$ \le \frac{n}{1-t} \Big( 1 - \int_0^t g(x,0)\,dx \Big) $$
$$ = n\,g(t,1). $$

So $g_n \prec g$. By construction we have that $S^*(g_n) = \lambda_n \cdot 1$. Furthermore, since
$g > 0$, $g_n(t,0) \uparrow 1$. This implies that $g_n(t,1) \uparrow 1$, so $g_n \uparrow 1$. This proves that
assumption (A2) is also satisfied. Theorem 3.13 now states that the CAR
assumption cannot be tested in the case of right-censored data. We wish to
remark that this in itself is not a new result, but merely an illustration of
Theorem 3.13.
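
The verification of (A2) for right-censored data can be replayed exactly for one concrete $g$. The sketch assumes the construction $g_n(t,0) = 1_{\{g(t,0) > 1/n\}}$, $\lambda_n = \int_0^1 g_n(x,0)\,dx$ and $g_n(t,1) = \frac{1}{1-t}(\lambda_n - \int_0^t g_n(x,0)\,dx)$, and takes the hypothetical choice $g(t,0) = t$, for which $S^*(g) = 1$ forces $g(t,1) = (1 - t^2/2)/(1-t)$; all integrals are then available in closed form and are coded as such.

```python
from fractions import Fraction as F

def g_n_censored(t, n):
    """g_n(t,0) = 1_{g(t,0) > 1/n} for the test choice g(t,0) = t."""
    return F(1) if t > F(1, n) else F(0)

def lam(n):
    """lambda_n = int_0^1 g_n(x,0) dx = 1 - 1/n."""
    return 1 - F(1, n)

def int_g_n_censored(t, n):
    """int_0^t g_n(x,0) dx = max(t - 1/n, 0)."""
    return max(t - F(1, n), F(0))

def g_n_uncensored(t, n):
    """g_n(t,1) = (lambda_n - int_0^t g_n(x,0) dx) / (1 - t)."""
    return (lam(n) - int_g_n_censored(t, n)) / (1 - t)

def g_uncensored(t):
    """g(t,1) = (1 - int_0^t x dx) / (1 - t), forced by S*(g) = 1."""
    return (1 - t * t / 2) / (1 - t)

def S_star_g_n(t, n):
    """S*(g_n)(t) = (1-t) g_n(t,1) + int_0^t g_n(x,0) dx."""
    return (1 - t) * g_n_uncensored(t, n) + int_g_n_censored(t, n)

ts = [F(k, 50) for k in range(1, 50)]    # grid in ]0,1[
ns = [2, 5, 10, 100]

constant = all(S_star_g_n(t, n) == lam(n) for t in ts for n in ns)
dominated = all(
    g_n_censored(t, n) <= n * t and g_n_uncensored(t, n) <= n * g_uncensored(t)
    for t in ts for n in ns
)
```

On this grid `constant` and `dominated` both hold, and for $n = 100$ the function $g_n(\cdot,1)$ already equals 1 at every grid point, in line with $g_n \uparrow 1$.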


APPENDIX 

In this Appendix we will give the proof of a lemma which is a bit technical.
We repeat some notation: define for a probability measure $P$ the space
$E = (L^\infty(P))'$. This is an ordered vector space and $L^1(P)$ is a band in
$E$, which means that each $h \in E_+$ can be uniquely decomposed as $h =
h_{/\!/} + h_\perp$, where $h_{/\!/} \in L^1(P)_+$ and $h_\perp \ge 0$ is disjoint from $P$, so for each $f \in
L^1(P)_+$, $\inf(f, h_\perp) = 0$. According to Schaefer and Wolff [(1999), Chapter V,
Theorem 1.5] this is equivalent to saying that for each $\phi \in L^\infty(P)_+$ and each
$\varepsilon > 0$, there exists a decomposition $\phi = \phi_1 + \phi_2$, $\phi_1 \ge 0$, $\phi_2 \ge 0$, such that
$\langle h_\perp, \phi_1\rangle + \langle f, \phi_2\rangle < \varepsilon$. For convenience, we repeat the definition of $KL_f$ for
$f \in L^1(P)_+$. Define for $h \in E_+$,

$$ KL_f(h) = \sup\Big\{ \sum_{i=1}^n -\log\Big(\frac{\langle h,\phi_i\rangle}{\langle f,\phi_i\rangle}\Big)\langle f,\phi_i\rangle : \phi_i \in L^\infty(P)_+,\ \sum_{i=1}^n \phi_i = 1 \Big\}. $$

Lemma A.1. Let $f \in L^1(P)_+$. Then, in the notation introduced above,
for all $h \in E_+$,

$$ KL_f(h) = KL_f(h_{/\!/}) = \int -\log\Big(\frac{h_{/\!/}}{f}\Big) f \, dP. $$

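Proof. The second equality is well known for the Kullback-Leibler
divergence [see, e.g., Pinsker (1964), Section 2.4] and can be proved using
standard techniques like monotone classes. As for the first, since $h \ge h_{/\!/}$,
it is clear that $KL_f(h_{/\!/}) \ge KL_f(h)$ ($-\log$ is a decreasing function).
Assume $KL_f(h) < +\infty$. We would like to make an important observation; if
$\sum_{i=1}^n \phi_i = 1$ and we decompose each $\phi_i = \sum_j \phi_{i,j}$, then Jensen gives us

(A.1) $$ \sum_{i=1}^n \sum_j -\log\Big(\frac{\langle h,\phi_{i,j}\rangle}{\langle f,\phi_{i,j}\rangle}\Big)\langle f,\phi_{i,j}\rangle \ge \sum_{i=1}^n -\log\Big(\frac{\langle h,\phi_i\rangle}{\langle f,\phi_i\rangle}\Big)\langle f,\phi_i\rangle. $$

We only need to consider $\phi_i$ such that $\langle f, \phi_i\rangle > 0$ [we define $\log(1/0) \cdot 0 = 0$].
We also know that for each such $\phi_i$, $\langle h_{/\!/}, \phi_i\rangle > 0$. For if not, we could
decompose $\phi_i = \phi_{i,1} + \phi_{i,2}$ such that $\langle h, \phi_{i,1}\rangle$ is arbitrarily small and $\langle f, \phi_{i,1}\rangle \ge
\langle f, \phi_i\rangle/2$, which by (A.1) would imply that $KL_f(h) = +\infty$.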

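So consider $\phi_i \ge 0$ with $\langle h_{/\!/}, \phi_i\rangle > 0$ and $1_{\{f>0\}} \le \sum_{i=1}^n \phi_i \le 1$. Let $\varepsilon > 0$.
Because $\inf(h_\perp, f) = \inf(h_\perp, h_{/\!/}) = 0$, we can find a decomposition $\phi_i =
\phi_{i,1} + \phi_{i,2}$ for each $i$ such that $\langle h_{/\!/}, \phi_i - \phi_{i,1}\rangle = \langle h_{/\!/}, \phi_{i,2}\rangle < \delta$, $\langle f, \phi_i - \phi_{i,1}\rangle = \langle f, \phi_{i,2}\rangle < \delta$
and $\langle h_\perp, \phi_{i,1}\rangle < \delta$. Here we can choose $\delta > 0$ such that

$$ \sum_{i=1}^n -\log\Big(\frac{\langle h_{/\!/} + h_\perp, \phi_{i,1}\rangle}{\langle f, \phi_{i,1}\rangle}\Big)\langle f, \phi_{i,1}\rangle \ge \sum_{i=1}^n -\log\Big(\frac{\langle h_{/\!/}, \phi_i\rangle}{\langle f, \phi_i\rangle}\Big)\langle f, \phi_i\rangle - \varepsilon $$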

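and, noting that $\langle h, 1\rangle > 0$, since $KL_f(h) < +\infty$,

$$ \sum_{i=1}^n -\log\Big(\frac{\langle h, 1\rangle}{\langle f, \phi_{i,2}\rangle}\Big)\langle f, \phi_{i,2}\rangle \ge -\varepsilon. $$

This last inequality implies that

$$ \sum_{i=1}^n -\log\Big(\frac{\langle h, \phi_{i,2}\rangle}{\langle f, \phi_{i,2}\rangle}\Big)\langle f, \phi_{i,2}\rangle \ge -\varepsilon. $$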
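All in all, we can conclude that

$$ KL_f(h) \ge \sum_{i=1}^n \sum_{j=1}^2 -\log\Big(\frac{\langle h, \phi_{i,j}\rangle}{\langle f, \phi_{i,j}\rangle}\Big)\langle f, \phi_{i,j}\rangle \ge \sum_{i=1}^n -\log\Big(\frac{\langle h_{/\!/}, \phi_i\rangle}{\langle f, \phi_i\rangle}\Big)\langle f, \phi_i\rangle - 2\varepsilon. $$

This proves that $KL_f(h) \ge KL_f(h_{/\!/})$. □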